Abstract

Recently, sparsity has become a key concept in various areas of applied mathematics, 
computer science, and electrical engineering. One application of this novel methodology 
is the separation of data, which is composed of two (or more) morphologically distinct 
constituents. The key idea is to carefully select representation systems each providing 
sparse approximations of one of the components. Then the sparsest coefficient vector 
representing the data within the composed - and therefore highly redundant - representation
system is computed by $\ell_1$ minimization or thresholding. This automatically
enforces separation. 

This paper shall serve as an introduction to and a survey of this exciting area
of research as well as a reference for the state of the art of this research field.

1 Introduction



In recent years, scientists have faced an ever-growing deluge of data, which needs to be
transmitted, analyzed, and stored. A close analysis reveals that most of these data might 
be classified as multimodal data, i.e., being composed of distinct subcomponents. Prominent 
examples are audio data, which might consist of a superposition of the sounds of different 
instruments, or imaging data from neurobiology, which is typically a composition of the 
soma of a neuron, its dendrites, and its spines. In both these exemplary situations, the 
data has to be separated into appropriate single components for further analysis. In the 
first case, separating the audio signal into the signals of the different instruments is a first 
step to enable the audio technician to obtain a musical score from a recording. In the 
second case, the neurobiologist might aim to analyze the structure of dendrites and spines 
separately for the study of Alzheimer-specific characteristics. Thus data separation is often
a crucial step in the analysis of data. 

Three fundamental problems immediately come to mind:

(P1) What is a mathematically precise meaning of the vague term 'distinct components'?

(P2) How do we separate data algorithmically? 

(P3) When is separation possible at all? 

To answer those questions, we need to first understand the key problem in data separation. 
In a very simplistic view, the essence of the problem is as follows: Given a composed signal 
$x$ of the form $x = x_1 + x_2$, we aim to extract the unknown components $x_1$ and $x_2$ from it.
Having one known datum and two unknowns obviously makes this problem underdetermined.
Thus, the novel paradigm of sparsity - appropriately utilized - seems a perfect fit for
attacking data separation, and this chapter shall serve as both an introduction into this 
intriguing application of sparse representations as well as a reference for the state-of-the-art 
of this research area. 

1.1 Morphological Component Analysis 

Intriguingly, when considering the history of Compressed Sensing, the first mathematically
precise result on recovery of sparse vectors by $\ell_1$ minimization is related to a data separation
problem: the separation of sinusoids and spikes in [16, 11]. Thus it might be considered
a milestone in the development of Compressed Sensing. In addition, it reveals a surprising
connection with uncertainty principles.

The general idea allowing separation in [16, 11] was to choose two bases or frames $\Phi_1$ and
$\Phi_2$ adapted to the two components to be separated in such a way that $\Phi_1$ and $\Phi_2$ provide a
sparse representation for $x_1$ and $x_2$, respectively. Searching for the sparsest representation of
the signal in the combined (highly overcomplete) dictionary $[\Phi_1 \,|\, \Phi_2]$ should then intuitively
enforce separation provided that $x_1$ does not have a sparse representation in $\Phi_2$ and that $x_2$
does not have a sparse representation in $\Phi_1$. This general concept was later - in the context
of image separation, but the term seems to be fitting in general - coined Morphological
Component Analysis [36].

This viewpoint now measures the morphological difference between components in terms
of the incoherence of suitable sparsifying bases or frames, thereby giving one possible
answer to (P1); see also the respective chapters in the novel book [33]. One possibility
for measuring incoherence is the mutual coherence. We will however see in the sequel that
there exist even more appropriate coherence notions, which provide a much more refined
measurement of incoherence specifically adapted to measuring morphological difference.

1.2 Separation Algorithms 

Going again back in time, we observe that far before [11], Coifman, Wickerhauser, and
co-workers already presented very inspiring empirical results on the separation of image
components using the idea of Morphological Component Analysis, see [7]. After this, several
techniques to actually compute the sparsest expansion in a composed dictionary $[\Phi_1 \,|\, \Phi_2]$
were introduced. In [31], Mallat and Zhang developed Matching Pursuit as one possible
methodology. The study by Chen, Donoho, and Saunders in [6] then revealed that the $\ell_1$
norm has a tendency to find sparse solutions when they exist, and coined this method Basis
Pursuit.
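
As a rough illustration of the greedy route, the following sketch runs plain Matching Pursuit on a toy composed dictionary. Everything concrete here is an assumption made for illustration: the use of an orthonormal DCT basis as the oscillatory system, the Dirac basis for spikes, the dimension, the test signal, and all function names. It is not the implementation from [31].

```python
import math

def dct_atom(k, n):
    """k-th orthonormal DCT-II atom of length n (assumed stand-in for an oscillatory basis)."""
    scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
    return [scale * math.cos(math.pi * (2 * t + 1) * k / (2 * n)) for t in range(n)]

def dirac_atom(t0, n):
    """Unit spike at sample t0."""
    return [1.0 if t == t0 else 0.0 for t in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(x, atoms, max_iter=100, tol=1e-12):
    """Greedy expansion of x in unit-norm atoms; returns coefficients and final residual."""
    coeffs = [0.0] * len(atoms)
    residual = list(x)
    for _ in range(max_iter):
        if math.sqrt(dot(residual, residual)) < tol:
            break
        # Pick the atom that correlates most strongly with the current residual.
        corr = [dot(residual, a) for a in atoms]
        best = max(range(len(atoms)), key=lambda j: abs(corr[j]))
        coeffs[best] += corr[best]
        residual = [r - corr[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual

n = 64
atoms = [dct_atom(k, n) for k in range(n)] + [dirac_atom(t, n) for t in range(n)]
# Toy signal: one cosine atom (index 5) plus one spike (sample 20).
x = [3.0 * c + 2.0 * s for c, s in zip(atoms[5], atoms[n + 20])]

coeffs, residual = matching_pursuit(x, atoms)
support = {j for j, c in enumerate(coeffs) if abs(c) > 1e-6}
# Separated components: oscillatory part from the DCT block, spikes from the Dirac block.
x1_est = [sum(coeffs[k] * atoms[k][t] for k in range(n)) for t in range(n)]
x2_est = coeffs[n:]
```

In this toy setting the selected atoms stay within the two that generated the signal, so reading off the two coefficient blocks yields the two components. For less incoherent dictionaries or denser supports, plain Matching Pursuit may select wrong atoms.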

As explained before, data separation by Morphological Component Analysis - when 
suitably applied - can be reduced to a sparse recovery problem. To solve this problem, there 
nowadays already exist a variety of utilizable algorithmic approaches; thereby providing a 
general answer to (P2). Such approaches include, for instance, a canon of greedy-type
algorithms. Most of the theoretical separation results however consider $\ell_1$ minimization as
the main separation technique, which is what we will also mainly focus on in this chapter. 

1.3 Separation Results 

As already mentioned, the first mathematically precise result was derived in [11] and solved 
the problem of separation of sinusoids and spikes. After this 'birth of sparse data separation', 
a deluge of very exciting results started. One direction of research are general results on 
sparse recovery and Compressed Sensing; here we would like to cite the excellent survey 
paper [Ij. 

Another direction continued the idea of sparse data separation initiated in [11]. In
this realm, the most significant theoretical results might be considered firstly the series of
papers [19, 10], in which the initial results from [11] are extended to general composed
dictionaries, secondly the paper [23], which also extends results from [11] though with a
different perspective, and thirdly the papers [3] and [14], which explore the clustering of the
sparse coefficients and the morphological difference of the components encoded in it.

We also wish to mention the abundance of empirical work showing that utilizing the
idea of sparse data separation often gives very compelling results in practice; as examples,
we refer to the series of papers on applications to astronomical data [2, 36, 33], to general
imaging data [32, 20, 35], and to audio data [22, 25].

Let us remark that also the classical problem of denoising can be regarded as a separation 
problem, since we aim to separate a signal from noise by utilizing the characteristics of the 
signal family and the noise. However, as opposed to the separation problems discussed in 
this chapter, denoising is not a 'symmetric' separation task, since the characterization of 
the signal and the noise are very different. 
1.4 Design of Sparse Dictionaries

For satisfactorily answering (P3), one must also raise the question of how to find suitable 
sparsifying bases or frames for given components. This search for 'good' systems in the sense 
of sparse dictionaries can be attacked in two ways, either non-adaptively or adaptively. 

The first path explores the structure of the component one would like to extract, for 
instance, it could be periodic such as sinusoids or anisotropic such as edges in images. 
This typically allows one to find a suitable system among the already very well explored 
representation systems such as the Fourier basis, wavelets, or shearlets, to name a few. The 
advantage of this approach is the already explored structure of the system, which can hence 
be exploited for deriving theoretical results on the accuracy of separation, and the speed of 
associated transforms. 

The second path uses a training set of data similar to the to-be-extracted component,
and 'learns' a system which best sparsifies this data set. Using this approach, customarily
referred to as dictionary learning, we obtain a system extremely well adapted to the data
at hand; as the state of the art we would like to mention the K-SVD algorithm introduced
by Aharon, Elad, and Bruckstein in [1]; see also [17] for a 'Compressed Sensing' perspective
on K-SVD. Another appealing dictionary training algorithm which should be cited is the
method of optimal directions (MOD) by Engan et al. [21]. The downside however is the
lack of a mathematically exploitable structure, which makes a theoretical analysis of the
accuracy of separation using such a system very hard.

1.5 Outline 

In Section 2, we discuss the formal mathematical setting of the problem, present the
separation results that are by now considered classical, and then discuss more recent results
exploiting the clustering of significant coefficients in the expansions of the components as
a means to measure their morphological difference. We conclude this section by revealing
a close link of data separation to uncertainty principles. Section 3 is then devoted to
both theoretical results as well as applications for separation of 1D signals, elaborating,
in particular, on the separation of sinusoids and spikes. Finally, Section 4 focuses on diverse
questions concerning separation of 2D signals, i.e., images, such as the separation of
point- and curvelike objects, again presenting both application aspects as well as theoretical
results.

2 Separation Estimates 

As already mentioned in the introduction, data separation can be regarded within the 
framework of underdetermined problems. In this section, we make this link mathematically 
precise. Then we discuss general estimates on the separability of composed data, firstly 
without any knowledge of the geometric structure of sparsity patterns, and secondly, by 
taking known geometric information into account. A revelation of the close relation with 
uncertainty principles concludes the section. 

In Sections 3 and 4 we will then see the presented general results and uncertainty
principles in action, i.e., applied to real-world separation problems.

2.1 Relation with Underdetermined Problems



Let $x$ be our signal of interest, which we for now consider as belonging to some Hilbert
space $\mathcal{H}$, and assume that

\[ x = x_1^0 + x_2^0. \]

Certainly, real data is typically composed of multiple components, hence not only the
situation of two components, but three or more is of interest. We will however focus on the
two-component situation to clarify the fundamental principles behind the success of
separating those by sparsity methodologies. It should be mentioned though that, in fact, most
of the presented theoretical results can be extended to the multiple component situation in
a more or less straightforward manner.

To extract the two components from $x$, we need to assume that - although we are not
given $x_1^0$ and $x_2^0$ - certain 'characteristics' of those components are known to us. Such
'characteristics' might be, for instance, the pointlike structure of stars and the curvelike
structure of filaments in astronomical imaging. This knowledge now enables us to choose
two representation systems, $\Phi_1$ and $\Phi_2$, say, which allow sparse expansions of $x_1^0$ and $x_2^0$,
respectively. Such representation systems might be chosen from the collection of well-known
systems such as wavelets. A different possibility is to choose the systems adaptively
via dictionary learning procedures. This approach however requires training data sets for
the two components $x_1^0$ and $x_2^0$ as discussed in Subsection 1.4.

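Given now two such representation systems $\Phi_1$ and $\Phi_2$, we can write $x$ as

\[ x = x_1^0 + x_2^0 = \Phi_1 c_1^0 + \Phi_2 c_2^0 = [\Phi_1 \,|\, \Phi_2] \begin{bmatrix} c_1^0 \\ c_2^0 \end{bmatrix} \]

with $\|c_1^0\|_0$ and $\|c_2^0\|_0$ 'sufficiently small'. Thus, the data separation problem has been
reduced to solving the underdetermined linear system

\[ x = [\Phi_1 \,|\, \Phi_2] \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} \tag{1} \]

for $[c_1, c_2]^T$. Unique recovery of the original vector $[c_1^0, c_2^0]^T$ automatically extracts the
correct two components $x_1^0$ and $x_2^0$ from $x$, since

\[ x_1^0 = \Phi_1 c_1^0 \quad \text{and} \quad x_2^0 = \Phi_2 c_2^0. \]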

Ideally, one might want to solve

\[ \min_{c_1, c_2} \|c_1\|_0 + \|c_2\|_0 \quad \text{s.t.} \quad x = [\Phi_1 \,|\, \Phi_2] \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}, \tag{2} \]

which however is an NP-hard problem. Instead one aims to solve the $\ell_1$ minimization
problem

\[ (\mathrm{Sep}_s) \qquad \min_{c_1, c_2} \|c_1\|_1 + \|c_2\|_1 \quad \text{s.t.} \quad x = [\Phi_1 \,|\, \Phi_2] \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}. \tag{3} \]

The lower case 's' in $\mathrm{Sep}_s$ indicates that the $\ell_1$ norm is placed on the synthesis side. Other
choices for separation are, for instance, greedy-type algorithms. In this chapter we will focus
on $\ell_1$ minimization as the separation technique, consistent with most known separation
results from the literature.

Before discussing conditions on $[c_1^0, c_2^0]^T$ and $[\Phi_1 \,|\, \Phi_2]$ which guarantee unique solvability
of (1), let us for a moment debate whether uniqueness is necessary at all. If $\Phi_1$ and $\Phi_2$
form bases, it is certainly essential to recover $[c_1^0, c_2^0]^T$ uniquely from (1). However, some
well-known representation systems are in fact redundant and typically constitute Parseval
frames such as curvelets or shearlets. Also, systems generated by dictionary learning are
normally highly redundant. In this situation, for each possible separation

\[ x = x_1 + x_2, \tag{4} \]

there exist infinitely many coefficient sequences $[c_1, c_2]^T$ satisfying

\[ x_1 = \Phi_1 c_1 \quad \text{and} \quad x_2 = \Phi_2 c_2. \tag{5} \]

Since we are only interested in the correct separation and not in computing the sparsest
expansion, we can circumvent presumably arising numerical instabilities when solving the
minimization problem (3) by selecting a particular coefficient sequence for each separation.
Assuming $\Phi_1$ and $\Phi_2$ are Parseval frames, we can exploit this structure and rewrite (5) as

\[ x_1 = \Phi_1(\Phi_1^T x_1) \quad \text{and} \quad x_2 = \Phi_2(\Phi_2^T x_2). \]

Thus, for each separation (4), we choose a specific coefficient sequence when expanding the
components in the Parseval frames; in fact, we choose the analysis sequence. This leads
to the following different $\ell_1$ minimization problem in which the $\ell_1$ norm is placed on the
analysis rather than the synthesis side:

\[ (\mathrm{Sep}_a) \qquad \min_{x_1, x_2} \|\Phi_1^T x_1\|_1 + \|\Phi_2^T x_2\|_1 \quad \text{s.t.} \quad x = x_1 + x_2. \tag{6} \]

This new minimization problem can also be regarded as a mixed $\ell_1$-$\ell_2$ problem, since
the analysis coefficient sequence is exactly the coefficient sequence which is minimal in the
$\ell_2$ norm.
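
The last remark can be checked numerically on a very small example. The sketch below uses the so-called Mercedes-Benz frame in the plane, rescaled to be a Parseval frame; this frame, the test vector, and the function names are assumptions chosen purely for illustration. It compares the analysis coefficients with a second coefficient sequence that synthesizes the same vector.

```python
import math

# Mercedes-Benz frame in R^2, scaled so that it is a Parseval frame (illustrative choice).
angles = [math.pi / 2, math.pi / 2 + 2 * math.pi / 3, math.pi / 2 + 4 * math.pi / 3]
frame = [(math.sqrt(2 / 3) * math.cos(a), math.sqrt(2 / 3) * math.sin(a)) for a in angles]

def analysis(x):
    """Analysis coefficients Phi^T x = (<x, phi_j>)_j."""
    return [x[0] * p[0] + x[1] * p[1] for p in frame]

def synthesis(c):
    """Synthesis Phi c = sum_j c_j phi_j."""
    return (sum(cj * p[0] for cj, p in zip(c, frame)),
            sum(cj * p[1] for cj, p in zip(c, frame)))

def l2(c):
    return math.sqrt(sum(v * v for v in c))

x = (1.0, -2.0)
c_analysis = analysis(x)
# The three atoms sum to zero, so (1, 1, 1) lies in the null space of Phi and
# shifting along it gives another coefficient sequence representing the same x.
c_other = [v + 0.5 for v in c_analysis]

x_from_analysis = synthesis(c_analysis)
x_from_other = synthesis(c_other)
```

Both sequences reproduce `x`, but the analysis sequence has the smaller $\ell_2$ norm, which for a Parseval frame equals the norm of `x` itself.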



2.2 General Separation Estimates 

Let us now discuss the main results of successful data separation, i.e., stating conditions on
$[c_1^0, c_2^0]^T$ and $[\Phi_1 \,|\, \Phi_2]$ for extracting $x_1^0$ and $x_2^0$ from $x$. The strongest known general result
was derived in 2003 by Donoho and Elad [10] and used the notion of mutual coherence.
Recall that, for a normalized frame $\Phi = (\varphi_i)_{i \in I}$, the mutual coherence of $\Phi$ is defined by

\[ \mu(\Phi) = \max_{i, j \in I,\, i \neq j} |\langle \varphi_i, \varphi_j \rangle|. \]

The result states the following.

Theorem 2.1 ([10]) Let $\Phi_1$ and $\Phi_2$ be two frames for a Hilbert space $\mathcal{H}$, and let $x \in \mathcal{H}$,
$x \neq 0$. If $x = [\Phi_1 \,|\, \Phi_2] c$ and

\[ \|c\|_0 < \frac{1}{2} \left( 1 + \frac{1}{\mu([\Phi_1 \,|\, \Phi_2])} \right), \]

then the solution of the $\ell_1$ minimization problem $(\mathrm{Sep}_s)$ stated in (3) coincides with the
solution of the $\ell_0$ minimization problem stated in (2).
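
To make the quantities in Theorem 2.1 concrete, the following sketch computes the mutual coherence of a small composed dictionary and the resulting sparsity level. The choice of the Fourier and Dirac bases, the dimension `n = 16`, and all function names are assumptions for illustration only.

```python
import cmath
import math

def fourier_atom(w, n):
    """Unit-norm discrete Fourier atom with frequency w."""
    return [cmath.exp(2j * cmath.pi * w * t / n) / math.sqrt(n) for t in range(n)]

def dirac_atom(t0, n):
    """Unit spike at sample t0."""
    return [1.0 if t == t0 else 0.0 for t in range(n)]

def inner(u, v):
    """Inner product <u, v>, conjugating the second argument."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def mutual_coherence(atoms):
    """Largest |<phi_i, phi_j>| over pairs of distinct unit-norm atoms."""
    return max(abs(inner(atoms[i], atoms[j]))
               for i in range(len(atoms)) for j in range(i + 1, len(atoms)))

n = 16
dictionary = [fourier_atom(w, n) for w in range(n)] + [dirac_atom(t, n) for t in range(n)]
mu = mutual_coherence(dictionary)
# Sparsity level below which Theorem 2.1 guarantees that l1 and l0 minimization agree.
sparsity_bound = 0.5 * (1.0 + 1.0 / mu)
```

For this pair of bases every cross inner product has modulus $1/\sqrt{n}$, so the coherence is as small as it can be for two orthonormal bases and the admissible sparsity grows like $\sqrt{n}/2$.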
Before presenting the proof, we require some prerequisites. Firstly, we need to introduce
the so-called null space property.

Definition 2.2 Let $\Phi = (\varphi_i)_{i \in I}$ be a frame for a Hilbert space $\mathcal{H}$, and let $\mathcal{N}(\Phi)$ denote the
null space of $\Phi$. Then $\Phi$ is said to have the null space property of order $k$ if

\[ \|1_\Lambda d\|_1 < \frac{1}{2} \|d\|_1 \]

for all $d \in \mathcal{N}(\Phi) \setminus \{0\}$ and for all sets $\Lambda \subseteq I$ with $|\Lambda| \le k$.

This notion provides a very useful characterization of the existence of unique sparse
solutions of the $\ell_1$ minimization problem $(\mathrm{Sep}_s)$ stated in (3).

Lemma 2.3 Let $\Phi = (\varphi_i)_{i \in I}$ be a frame for a Hilbert space $\mathcal{H}$, and let $x \in \mathcal{H}$. Then the
following conditions are equivalent.

(i) All vectors $c$ with $\|c\|_0 \le k$ are unique solutions of the $\ell_1$ minimization problem $(\mathrm{Sep}_s)$
stated in (3) (with $\Phi$ instead of $[\Phi_1 \,|\, \Phi_2]$).

(ii) $\Phi$ satisfies the null space property of order $k$.

Proof. First, assume that (i) holds. Let $d \in \mathcal{N}(\Phi) \setminus \{0\}$ and $\Lambda \subseteq I$ with $|\Lambda| \le k$ be
arbitrary. Then, by (i), the sparse vector $1_\Lambda d$ is the unique minimizer of $\|c\|_1$ subject to
$\Phi c = \Phi(1_\Lambda d)$. Further, since $d \in \mathcal{N}(\Phi) \setminus \{0\}$,

\[ \Phi(-1_{\Lambda^c} d) = \Phi(1_\Lambda d). \]

Hence

\[ \|1_\Lambda d\|_1 < \|1_{\Lambda^c} d\|_1, \]

or, in other words,

\[ \|1_\Lambda d\|_1 < \frac{1}{2} \|d\|_1, \]

which implies (ii), since $d$ and $\Lambda$ were chosen arbitrarily.

Secondly, assume that (ii) holds, and let $c_1$ be a vector with $\|c_1\|_0 \le k$ and support
denoted by $\Lambda$. Further, let $c_2$ be an arbitrary solution of $x = \Phi c$, and set

\[ d = c_2 - c_1. \]

Then

\[ \|c_2\|_1 - \|c_1\|_1 = \|1_{\Lambda^c} c_2\|_1 + \|1_\Lambda c_2\|_1 - \|1_\Lambda c_1\|_1 \ge \|1_{\Lambda^c} d\|_1 - \|1_\Lambda d\|_1. \]

This term is greater than zero for any $d \neq 0$ if

\[ \|1_{\Lambda^c} d\|_1 > \|1_\Lambda d\|_1, \]

or

\[ \frac{1}{2} \|d\|_1 > \|1_\Lambda d\|_1. \]

This is ensured by (ii). Hence $\|c_2\|_1 > \|c_1\|_1$, and thus $c_1$ is the unique solution of $(\mathrm{Sep}_s)$.
This implies (i). □

Using this result, we next prove that a solution satisfying $\|c\|_0 < \frac{1}{2} \left( 1 + \frac{1}{\mu(\Phi)} \right)$ is the
unique solution of the $\ell_1$ minimization problem $(\mathrm{Sep}_s)$.
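Lemma 2.4 Let $\Phi = (\varphi_i)_{i \in I}$ be a frame for a Hilbert space $\mathcal{H}$, and let $x \in \mathcal{H}$. If $c$ is a
solution of the $\ell_1$ minimization problem $(\mathrm{Sep}_s)$ stated in (3) (with $\Phi$ instead of $[\Phi_1 \,|\, \Phi_2]$) and
satisfies

\[ \|c\|_0 < \frac{1}{2} \left( 1 + \frac{1}{\mu(\Phi)} \right), \]

then it is the unique solution.

Proof. Let $d \in \mathcal{N}(\Phi) \setminus \{0\}$, hence, in particular,

\[ \Phi d = 0; \]

thus also

\[ \Phi^T \Phi d = 0. \tag{7} \]

Without loss of generality, we now assume that the vectors in $\Phi$ are normalized. Then, (7)
implies that, for all $i \in I$,

\[ d_i = - \sum_{j \neq i} \langle \varphi_i, \varphi_j \rangle d_j. \]

Using the definition of mutual coherence $\mu(\Phi)$ (cf. Subsection 2.2), we obtain

\[ |d_i| \le \sum_{j \neq i} |\langle \varphi_i, \varphi_j \rangle| \cdot |d_j| \le \mu(\Phi) \left( \|d\|_1 - |d_i| \right), \]

and hence

\[ |d_i| \le \left( 1 + \frac{1}{\mu(\Phi)} \right)^{-1} \|d\|_1. \]

Thus, by the hypothesis on $\|c\|_0$ and for any $\Lambda \subseteq I$ with $|\Lambda| = \|c\|_0$, we have

\[ \|1_\Lambda d\|_1 \le |\Lambda| \cdot \left( 1 + \frac{1}{\mu(\Phi)} \right)^{-1} \|d\|_1 = \|c\|_0 \cdot \left( 1 + \frac{1}{\mu(\Phi)} \right)^{-1} \|d\|_1 < \frac{1}{2} \|d\|_1. \]

This shows that $\Phi$ satisfies the null space property of order $\|c\|_0$, which, by Lemma 2.3,
implies that $c$ is the unique solution of $(\mathrm{Sep}_s)$. □

We further prove that a solution satisfying $\|c\|_0 < \frac{1}{2} \left( 1 + \frac{1}{\mu(\Phi)} \right)$ is also the unique
solution of the $\ell_0$ minimization problem.

Lemma 2.5 Let $\Phi = (\varphi_i)_{i \in I}$ be a frame for a Hilbert space $\mathcal{H}$, and let $x \in \mathcal{H}$. If $c$ is
a solution of the $\ell_0$ minimization problem stated in (2) (with $\Phi$ instead of $[\Phi_1 \,|\, \Phi_2]$) and
satisfies

\[ \|c\|_0 < \frac{1}{2} \left( 1 + \frac{1}{\mu(\Phi)} \right), \]

then it is the unique solution.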
Proof. By Lemma 2.4, the hypotheses imply that $c$ is the unique solution of the $\ell_1$
minimization problem $(\mathrm{Sep}_s)$. Now, towards a contradiction, assume that there exists some
$\tilde{c} \neq c$ satisfying $x = \Phi \tilde{c}$ with $\|\tilde{c}\|_0 \le \|c\|_0$. Then $\tilde{c}$ must satisfy

\[ \|\tilde{c}\|_0 < \frac{1}{2} \left( 1 + \frac{1}{\mu(\Phi)} \right). \]

Again, by Lemma 2.4, $\tilde{c}$ is the unique solution of the $\ell_1$ minimization problem $(\mathrm{Sep}_s)$, a
contradiction. □

These lemmata now immediately imply Theorem 2.1.

Proof of Theorem 2.1. Theorem 2.1 follows from Lemmata 2.4 and 2.5. □

Interestingly, in the situation of $\Phi_1$ and $\Phi_2$ being two orthonormal bases, the bound can
be slightly strengthened. For the proof of this result, we refer the reader to [19].

Theorem 2.6 ([19]) Let $\Phi_1$ and $\Phi_2$ be two orthonormal bases for a Hilbert space $\mathcal{H}$, and
let $x \in \mathcal{H}$. If $x = [\Phi_1 \,|\, \Phi_2] c$ and

\[ \|c\|_0 < \frac{\sqrt{2} - 0.5}{\mu([\Phi_1 \,|\, \Phi_2])}, \]

then the solution of the $\ell_1$ minimization problem $(\mathrm{Sep}_s)$ stated in (3) coincides with the
solution of the $\ell_0$ minimization problem stated in (2).

This shows that in the special situation of two orthonormal bases, the bound is nearly
a factor of 2 stronger than in the general situation of Theorem 2.1.
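
The size of this improvement can be checked with a few lines of arithmetic. The sketch below evaluates both sparsity bounds for several coherence values; the function names and the sample coherences $\mu = 1/\sqrt{n}$ (the value for the Fourier and Dirac bases in dimension $n$) are assumptions made for illustration.

```python
import math

def general_bound(mu):
    """Theorem 2.1: ||c||_0 < (1 + 1/mu) / 2 for two general frames."""
    return 0.5 * (1.0 + 1.0 / mu)

def onb_bound(mu):
    """Theorem 2.6: ||c||_0 < (sqrt(2) - 0.5) / mu for two orthonormal bases."""
    return (math.sqrt(2.0) - 0.5) / mu

for n in (64, 1024, 16384):
    mu = 1.0 / math.sqrt(n)  # assumed example: Fourier and Dirac bases in dimension n
    ratio = onb_bound(mu) / general_bound(mu)
    print(n, general_bound(mu), onb_bound(mu), ratio)

# As mu tends to zero, the ratio of the two bounds approaches 2 * (sqrt(2) - 0.5).
limit_ratio = 2.0 * (math.sqrt(2.0) - 0.5)
small_mu_ratio = onb_bound(1e-6) / general_bound(1e-6)
```

The limiting ratio is roughly 1.83, which is the precise sense in which the orthonormal-basis bound is "nearly a factor of 2" stronger.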



2.3 Clustered Sparsity as a Novel Viewpoint 

In a concrete situation, we often have more information on the geometry of the to-be-separated
components $x_1^0$ and $x_2^0$. This information is typically encoded in a particular
clustering of the non-zero coefficients if a suitable basis or frame for the expansion of $x_1^0$
or $x_2^0$ is chosen. Think, for instance, of the tree clustering of wavelet coefficients of a point
singularity. Thus, it seems conceivable that the morphological difference is encoded not
only in the incoherence of the two chosen bases or frames adapted to $x_1^0$ and $x_2^0$, but in the
interaction of the elements of those bases or frames associated with the clusters of significant
coefficients. This should intuitively allow for weaker necessary conditions for separation.

One possibility for a notion capturing this idea is the so-called joint concentration which
was introduced in [14] with concepts going back to [16], and was in between again revived
in [11]. To provide some intuition for this notion, let $\Lambda_1$ and $\Lambda_2$ be subsets of indexing
sets of two Parseval frames. Then the joint concentration measures the maximal fraction
of the total $\ell_1$ norm which can be concentrated on the index set $\Lambda_1 \cup \Lambda_2$ of the combined
dictionary.

Definition 2.7 Let $\Phi_1 = (\varphi_{1i})_{i \in I}$ and $\Phi_2 = (\varphi_{2j})_{j \in J}$ be two Parseval frames for a Hilbert
space $\mathcal{H}$. Further, let $\Lambda_1 \subseteq I$ and $\Lambda_2 \subseteq J$. Then the joint concentration $\kappa = \kappa(\Lambda_1, \Phi_1; \Lambda_2, \Phi_2)$
is defined by

\[ \kappa(\Lambda_1, \Phi_1; \Lambda_2, \Phi_2) = \sup_x \frac{\|1_{\Lambda_1} \Phi_1^T x\|_1 + \|1_{\Lambda_2} \Phi_2^T x\|_1}{\|\Phi_1^T x\|_1 + \|\Phi_2^T x\|_1}. \]

One might ask how the notion of joint concentration relates to the widely exploited
mutual coherence, which was also utilized for the previous result. For this, we first briefly
discuss some variants of mutual coherence. A first variant better adapted to clustering of coefficients
was the Babel function first introduced in [10] and later in [37] under the label cumulative
coherence function, which, for a normalized frame $\Phi = (\varphi_i)_{i \in I}$ and some $m \in \{1, \ldots, |I|\}$, is
defined by

\[ \mu_B(m, \Phi) = \max_{\Lambda \subset I,\, |\Lambda| = m} \; \max_{j \notin \Lambda} \sum_{i \in \Lambda} |\langle \varphi_i, \varphi_j \rangle|. \]

This notion was later refined in [3] by considering the so-called structured $p$-Babel function,
defined for some family $\mathcal{S}$ of subsets of $I$ and some $1 \le p < \infty$ by

\[ \mu_{sB}(\mathcal{S}, \Phi) = \max_{\Lambda \in \mathcal{S}} \left( \max_{j \notin \Lambda} \sum_{i \in \Lambda} |\langle \varphi_i, \varphi_j \rangle|^p \right)^{1/p}. \]

Another variant, better adapted to data separation, is the cluster coherence introduced in
[14], whose definition we now formally state. Notice that we do not assume that the vectors
are normalized.

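Definition 2.8 Let $\Phi_1 = (\varphi_{1i})_{i \in I}$ and $\Phi_2 = (\varphi_{2j})_{j \in J}$ be two Parseval frames for a Hilbert
space $\mathcal{H}$, let $\Lambda_1 \subseteq I$, and let $\Lambda_2 \subseteq J$. Then the cluster coherence $\mu_c(\Lambda_1, \Phi_1; \Phi_2)$ of $\Phi_1$ and
$\Phi_2$ with respect to $\Lambda_1$ is defined by

\[ \mu_c(\Lambda_1, \Phi_1; \Phi_2) = \max_{j \in J} \sum_{i \in \Lambda_1} |\langle \varphi_{1i}, \varphi_{2j} \rangle|, \]

and the cluster coherence $\mu_c(\Phi_1; \Lambda_2, \Phi_2)$ of $\Phi_1$ and $\Phi_2$ with respect to $\Lambda_2$ is defined by

\[ \mu_c(\Phi_1; \Lambda_2, \Phi_2) = \max_{i \in I} \sum_{j \in \Lambda_2} |\langle \varphi_{1i}, \varphi_{2j} \rangle|. \]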
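
A small numerical sketch may help to see how cluster coherence differs from mutual coherence. It evaluates the cluster coherence of a set of Fourier atoms against the Dirac basis; the two bases, the dimension, the index set `cluster`, and the function names are hypothetical choices for illustration.

```python
import cmath
import math

def fourier_atom(w, n):
    """Unit-norm discrete Fourier atom with frequency w."""
    return [cmath.exp(2j * cmath.pi * w * t / n) / math.sqrt(n) for t in range(n)]

def dirac_atom(t0, n):
    """Unit spike at sample t0."""
    return [1.0 if t == t0 else 0.0 for t in range(n)]

def inner(u, v):
    """Inner product <u, v>, conjugating the second argument."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def cluster_coherence(cluster_atoms, other_atoms):
    """Maximum over atoms psi of the other system of sum_{phi in cluster} |<phi, psi>|."""
    return max(sum(abs(inner(phi, psi)) for phi in cluster_atoms) for psi in other_atoms)

n = 64
fourier = [fourier_atom(w, n) for w in range(n)]
dirac = [dirac_atom(t, n) for t in range(n)]

cluster = [3, 10, 11]  # hypothetical set Lambda_1 of significant Fourier coefficients
mu_c = cluster_coherence([fourier[w] for w in cluster], dirac)
```

Here each cross inner product has modulus $1/\sqrt{n}$, so the cluster coherence is simply $|\Lambda_1|/\sqrt{n}$. It grows with the size of the cluster, whereas the mutual coherence stays at $1/\sqrt{n}$.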

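The relation between joint concentration and cluster coherence is made precise in the
following result from [14].

Proposition 2.9 ([14]) Let $\Phi_1 = (\varphi_{1i})_{i \in I}$ and $\Phi_2 = (\varphi_{2j})_{j \in J}$ be two Parseval frames for
a Hilbert space $\mathcal{H}$, and let $\Lambda_1 \subseteq I$ and $\Lambda_2 \subseteq J$. Then

\[ \kappa(\Lambda_1, \Phi_1; \Lambda_2, \Phi_2) \le \max\{\mu_c(\Lambda_1, \Phi_1; \Phi_2), \mu_c(\Phi_1; \Lambda_2, \Phi_2)\}. \]

Proof. Let $x \in \mathcal{H}$. We now choose coefficient sequences $c_1$ and $c_2$ such that

\[ x = \Phi_1 c_1 = \Phi_2 c_2 \]

and, for $i = 1, 2$,

\[ \|c_i\|_1 \le \|d_i\|_1 \quad \text{for all } d_i \text{ with } x = \Phi_i d_i. \tag{8} \]

This implies that

\[ \begin{aligned}
\|1_{\Lambda_1} \Phi_1^T x\|_1 + \|1_{\Lambda_2} \Phi_2^T x\|_1
&= \|1_{\Lambda_1} \Phi_1^T \Phi_2 c_2\|_1 + \|1_{\Lambda_2} \Phi_2^T \Phi_1 c_1\|_1 \\
&\le \sum_{i \in \Lambda_1} \Big( \sum_{j \in J} |\langle \varphi_{1i}, \varphi_{2j} \rangle| |c_{2j}| \Big) + \sum_{j \in \Lambda_2} \Big( \sum_{i \in I} |\langle \varphi_{1i}, \varphi_{2j} \rangle| |c_{1i}| \Big) \\
&= \sum_{j \in J} \Big( \sum_{i \in \Lambda_1} |\langle \varphi_{1i}, \varphi_{2j} \rangle| \Big) |c_{2j}| + \sum_{i \in I} \Big( \sum_{j \in \Lambda_2} |\langle \varphi_{1i}, \varphi_{2j} \rangle| \Big) |c_{1i}| \\
&\le \mu_c(\Lambda_1, \Phi_1; \Phi_2) \|c_2\|_1 + \mu_c(\Phi_1; \Lambda_2, \Phi_2) \|c_1\|_1 \\
&\le \max\{\mu_c(\Lambda_1, \Phi_1; \Phi_2), \mu_c(\Phi_1; \Lambda_2, \Phi_2)\} (\|c_1\|_1 + \|c_2\|_1).
\end{aligned} \]

Since $\Phi_1$ and $\Phi_2$ are Parseval frames, we have

\[ x = \Phi_i (\Phi_i^T \Phi_i c_i) \quad \text{for } i = 1, 2. \]

Hence, by exploiting (8),

\[ \begin{aligned}
\|1_{\Lambda_1} \Phi_1^T x\|_1 + \|1_{\Lambda_2} \Phi_2^T x\|_1
&\le \max\{\mu_c(\Lambda_1, \Phi_1; \Phi_2), \mu_c(\Phi_1; \Lambda_2, \Phi_2)\} (\|\Phi_1^T \Phi_1 c_1\|_1 + \|\Phi_2^T \Phi_2 c_2\|_1) \\
&= \max\{\mu_c(\Lambda_1, \Phi_1; \Phi_2), \mu_c(\Phi_1; \Lambda_2, \Phi_2)\} (\|\Phi_1^T x\|_1 + \|\Phi_2^T x\|_1).
\end{aligned} \]

□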



Before stating the data separation estimate which uses joint concentration, we need to
discuss the conditions on sparsity of the components in the two Parseval frames. Since for
real data 'true sparsity' is unrealistic, a weaker condition will be imposed. For the next
result, a notion invoking the clustering of the significant coefficients will be required. This
notion, first utilized in [9], is defined for our data separation problem as follows.

Definition 2.10 Let $\Phi_1 = (\varphi_{1i})_{i \in I}$ and $\Phi_2 = (\varphi_{2j})_{j \in J}$ be two Parseval frames for a Hilbert
space $\mathcal{H}$, and let $\Lambda_1 \subseteq I$ and $\Lambda_2 \subseteq J$. Further, suppose that $x \in \mathcal{H}$ can be decomposed as
$x = x_1^0 + x_2^0$. Then the components $x_1^0$ and $x_2^0$ are called $\delta$-relatively sparse in $\Phi_1$ and $\Phi_2$
with respect to $\Lambda_1$ and $\Lambda_2$, if

\[ \|1_{\Lambda_1^c} \Phi_1^T x_1^0\|_1 + \|1_{\Lambda_2^c} \Phi_2^T x_2^0\|_1 \le \delta. \]

We now have all ingredients to state the data separation result from [14], which - as
compared to Theorem 2.1 - now invokes information about the clustering of coefficients.

Theorem 2.11 ([14]) Let $\Phi_1 = (\varphi_{1i})_{i \in I}$ and $\Phi_2 = (\varphi_{2j})_{j \in J}$ be two Parseval frames for a
Hilbert space $\mathcal{H}$, and suppose that $x \in \mathcal{H}$ can be decomposed as $x = x_1^0 + x_2^0$. Further, let
$\Lambda_1 \subseteq I$ and $\Lambda_2 \subseteq J$ be chosen such that $x_1^0$ and $x_2^0$ are $\delta$-relatively sparse in $\Phi_1$ and $\Phi_2$
with respect to $\Lambda_1$ and $\Lambda_2$. Then the solution $(x_1^*, x_2^*)$ of the $\ell_1$ minimization problem $(\mathrm{Sep}_a)$
stated in (6) satisfies

\[ \|x_1^* - x_1^0\|_2 + \|x_2^* - x_2^0\|_2 \le \frac{2\delta}{1 - 2\kappa}. \]
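Proof. First, using the fact that $\Phi_1$ and $\Phi_2$ are Parseval frames,

\[ \|x_1^* - x_1^0\|_2 + \|x_2^* - x_2^0\|_2 = \|\Phi_1^T (x_1^* - x_1^0)\|_2 + \|\Phi_2^T (x_2^* - x_2^0)\|_2. \]

The decomposition $x_1^0 + x_2^0 = x = x_1^* + x_2^*$ implies

\[ x_2^* - x_2^0 = -(x_1^* - x_1^0), \]

which, together with the fact that the $\ell_2$ norm is bounded by the $\ell_1$ norm, allows us to conclude that

\[ \|x_1^* - x_1^0\|_2 + \|x_2^* - x_2^0\|_2 \le \|\Phi_1^T (x_1^* - x_1^0)\|_1 + \|\Phi_2^T (x_1^* - x_1^0)\|_1. \tag{9} \]

By the definition of $\kappa$,

\[ \begin{aligned}
&\|\Phi_1^T (x_1^* - x_1^0)\|_1 + \|\Phi_2^T (x_1^* - x_1^0)\|_1 \\
&\quad = \left( \|1_{\Lambda_1} \Phi_1^T (x_1^* - x_1^0)\|_1 + \|1_{\Lambda_2} \Phi_2^T (x_1^* - x_1^0)\|_1 \right) + \|1_{\Lambda_1^c} \Phi_1^T (x_1^* - x_1^0)\|_1 + \|1_{\Lambda_2^c} \Phi_2^T (x_2^* - x_2^0)\|_1 \\
&\quad \le \kappa \cdot \left( \|\Phi_1^T (x_1^* - x_1^0)\|_1 + \|\Phi_2^T (x_1^* - x_1^0)\|_1 \right) + \|1_{\Lambda_1^c} \Phi_1^T (x_1^* - x_1^0)\|_1 + \|1_{\Lambda_2^c} \Phi_2^T (x_2^* - x_2^0)\|_1,
\end{aligned} \]

which yields

\[ \begin{aligned}
&\|\Phi_1^T (x_1^* - x_1^0)\|_1 + \|\Phi_2^T (x_1^* - x_1^0)\|_1 \\
&\quad \le \frac{1}{1 - \kappa} \left( \|1_{\Lambda_1^c} \Phi_1^T (x_1^* - x_1^0)\|_1 + \|1_{\Lambda_2^c} \Phi_2^T (x_2^* - x_2^0)\|_1 \right) \\
&\quad \le \frac{1}{1 - \kappa} \left( \|1_{\Lambda_1^c} \Phi_1^T x_1^*\|_1 + \|1_{\Lambda_1^c} \Phi_1^T x_1^0\|_1 + \|1_{\Lambda_2^c} \Phi_2^T x_2^*\|_1 + \|1_{\Lambda_2^c} \Phi_2^T x_2^0\|_1 \right).
\end{aligned} \]

Now using the relative sparsity of $x_1^0$ and $x_2^0$ in $\Phi_1$ and $\Phi_2$ with respect to $\Lambda_1$ and $\Lambda_2$, we
obtain

\[ \|\Phi_1^T (x_1^* - x_1^0)\|_1 + \|\Phi_2^T (x_1^* - x_1^0)\|_1 \le \frac{1}{1 - \kappa} \left( \|1_{\Lambda_1^c} \Phi_1^T x_1^*\|_1 + \|1_{\Lambda_2^c} \Phi_2^T x_2^*\|_1 + \delta \right). \tag{10} \]

By the minimality of $x_1^*$ and $x_2^*$ as solutions of $(\mathrm{Sep}_a)$ implying that

\[ \sum_{i=1}^{2} \left( \|1_{\Lambda_i^c} \Phi_i^T x_i^*\|_1 + \|1_{\Lambda_i} \Phi_i^T x_i^*\|_1 \right) = \|\Phi_1^T x_1^*\|_1 + \|\Phi_2^T x_2^*\|_1 \le \|\Phi_1^T x_1^0\|_1 + \|\Phi_2^T x_2^0\|_1, \]

we have

\[ \begin{aligned}
&\|1_{\Lambda_1^c} \Phi_1^T x_1^*\|_1 + \|1_{\Lambda_2^c} \Phi_2^T x_2^*\|_1 \\
&\quad \le \|\Phi_1^T x_1^0\|_1 + \|\Phi_2^T x_2^0\|_1 - \|1_{\Lambda_1} \Phi_1^T x_1^*\|_1 - \|1_{\Lambda_2} \Phi_2^T x_2^*\|_1 \\
&\quad \le \|\Phi_1^T x_1^0\|_1 + \|\Phi_2^T x_2^0\|_1 + \|1_{\Lambda_1} \Phi_1^T (x_1^* - x_1^0)\|_1 - \|1_{\Lambda_1} \Phi_1^T x_1^0\|_1 \\
&\qquad + \|1_{\Lambda_2} \Phi_2^T (x_2^* - x_2^0)\|_1 - \|1_{\Lambda_2} \Phi_2^T x_2^0\|_1.
\end{aligned} \]

Again exploiting relative sparsity leads to

\[ \|1_{\Lambda_1^c} \Phi_1^T x_1^*\|_1 + \|1_{\Lambda_2^c} \Phi_2^T x_2^*\|_1 \le \|1_{\Lambda_1} \Phi_1^T (x_1^* - x_1^0)\|_1 + \|1_{\Lambda_2} \Phi_2^T (x_2^* - x_2^0)\|_1 + \delta. \tag{11} \]

Combining (10) and (11) and again using joint concentration,

\[ \begin{aligned}
&\|\Phi_1^T (x_1^* - x_1^0)\|_1 + \|\Phi_2^T (x_1^* - x_1^0)\|_1 \\
&\quad \le \frac{1}{1 - \kappa} \left[ \|1_{\Lambda_1} \Phi_1^T (x_1^* - x_1^0)\|_1 + \|1_{\Lambda_2} \Phi_2^T (x_1^* - x_1^0)\|_1 + 2\delta \right] \\
&\quad \le \frac{1}{1 - \kappa} \left[ \kappa \cdot \left( \|\Phi_1^T (x_1^* - x_1^0)\|_1 + \|\Phi_2^T (x_1^* - x_1^0)\|_1 \right) + 2\delta \right].
\end{aligned} \]

Thus, by (9), we finally obtain

\[ \|x_1^* - x_1^0\|_2 + \|x_2^* - x_2^0\|_2 \le \left( 1 - \frac{\kappa}{1 - \kappa} \right)^{-1} \cdot \frac{2\delta}{1 - \kappa} = \frac{2\delta}{1 - 2\kappa}. \]

□
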
Using Proposition 2.9, this result can also be stated in terms of cluster coherence, which
on the one hand provides a more easily accessible estimate and allows a better comparison with
results using mutual coherence, but on the other hand poses a slightly weaker estimate.

Theorem 2.12 ([14]) Let $\Phi_1 = (\varphi_{1i})_{i \in I}$ and $\Phi_2 = (\varphi_{2j})_{j \in J}$ be two Parseval frames for a
Hilbert space $\mathcal{H}$, and suppose that $x \in \mathcal{H}$ can be decomposed as $x = x_1^0 + x_2^0$. Further, let
$\Lambda_1 \subseteq I$ and $\Lambda_2 \subseteq J$ be chosen such that $x_1^0$ and $x_2^0$ are $\delta$-relatively sparse in $\Phi_1$ and $\Phi_2$ with
respect to $\Lambda_1$ and $\Lambda_2$. Then the solution $(x_1^*, x_2^*)$ of the $\ell_1$ minimization problem $(\mathrm{Sep}_a)$ stated
in (6) satisfies

\[ \|x_1^* - x_1^0\|_2 + \|x_2^* - x_2^0\|_2 \le \frac{2\delta}{1 - 2\mu_c}, \]

with

\[ \mu_c = \max\{\mu_c(\Lambda_1, \Phi_1; \Phi_2), \mu_c(\Phi_1; \Lambda_2, \Phi_2)\}. \]

To thoroughly understand this estimate, it is important to notice that both relative
sparsity $\delta$ as well as cluster coherence $\mu_c$ depend heavily on the choice of the sets of significant
coefficients $\Lambda_1$ and $\Lambda_2$. Choosing those sets too large allows for a very small $\delta$, however
$\mu_c$ might not be less than $1/2$ anymore, thereby making the estimate useless. Choosing those
sets too small will force $\mu_c$ to become simultaneously small, in particular, smaller than $1/2$,
with the downside that $\delta$ might be large.
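
This trade-off can be made tangible by evaluating the right-hand side of the estimate for a few candidate choices of the index sets. In the sketch below, the function name and the listed pairs of relative sparsity and cluster coherence are hypothetical numbers chosen for illustration, not values derived from any particular data set.

```python
import math

def separation_error_bound(delta, mu_c):
    """Right-hand side 2*delta / (1 - 2*mu_c) of Theorem 2.12; infinite if mu_c >= 1/2."""
    if mu_c >= 0.5:
        return math.inf
    return 2.0 * delta / (1.0 - 2.0 * mu_c)

# Hypothetical (delta, mu_c) pairs for growing index sets: delta shrinks while mu_c grows.
candidates = [(0.80, 0.05), (0.30, 0.20), (0.15, 0.40), (0.02, 0.55)]
bounds = [separation_error_bound(d, m) for d, m in candidates]
# Index of the candidate with the smallest guaranteed separation error.
best = min(range(len(candidates)), key=lambda i: bounds[i])
```

With these numbers, neither the smallest nor the largest index sets give the best guarantee; the largest choice yields no guarantee at all because its cluster coherence exceeds $1/2$.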

It is also essential to realize that the sets $\Lambda_1$ and $\Lambda_2$ are a mere analysis tool; they do
not appear in the minimization problem $(\mathrm{Sep}_a)$. This means that the algorithm does not
care about this choice at all, however the estimate for accuracy of separation does.

Also note that this result can be easily generalized to general frames instead of Parseval
frames, which then changes the separation estimate by invoking the lower frame bound. In
addition, a version including noise was derived in [14].

2.4 Relation with Uncertainty Principles

Intriguingly, there exists a very close connection between uncertainty principles and data
separation problems. Given a signal $x \in \mathcal{H}$ and two bases or frames $\Phi_1$ and $\Phi_2$, loosely
speaking, an uncertainty principle states that $x$ cannot be sparsely represented by $\Phi_1$ and
$\Phi_2$ simultaneously; one of the expansions is always not sparse unless $x = 0$. For the relation
to the 'classical' uncertainty principle, we refer to Subsection 3.1.

The first result making this uncertainty viewpoint precise was proven in [19] with ideas
already lurking in [16] and [11]. Again, it turns out that the mutual coherence is an appropriate
measure for allowed sparsity, here serving as a lower bound for the simultaneously
achievable sparsity of two expansions.

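Theorem 2.13 ([19]) Let $\Phi_1$ and $\Phi_2$ be two orthonormal bases for a Hilbert space $\mathcal{H}$, and
let $x \in \mathcal{H}$, $x \neq 0$. Then

\[ \|\Phi_1^T x\|_0 + \|\Phi_2^T x\|_0 \ge \frac{2}{\mu([\Phi_1 \,|\, \Phi_2])}. \]

Proof. First, let $\Phi_1 = (\varphi_{1i})_{i \in I}$ and $\Phi_2 = (\varphi_{2j})_{j \in J}$. Further, let $\Lambda_1 \subseteq I$ and $\Lambda_2 \subseteq J$ denote
the support of $\Phi_1^T x$ and $\Phi_2^T x$, respectively. Since $x = \Phi_1 \Phi_1^T x$, for each $j \in J$,

\[ |(\Phi_2^T x)_j| = \Big| \sum_{i \in \Lambda_1} (\Phi_1^T x)_i \langle \varphi_{1i}, \varphi_{2j} \rangle \Big|. \tag{12} \]

Since $\Phi_1$ and $\Phi_2$ are orthonormal bases, we have

\[ \|x\|_2 = \|\Phi_1^T x\|_2 = \|\Phi_2^T x\|_2. \tag{13} \]

Using in addition the Cauchy-Schwarz inequality, we can continue (12) by

\[ |(\Phi_2^T x)_j|^2 \le \|\Phi_1^T x\|_2^2 \cdot \sum_{i \in \Lambda_1} |\langle \varphi_{1i}, \varphi_{2j} \rangle|^2 \le \|x\|_2^2 \cdot |\Lambda_1| \cdot \mu([\Phi_1 \,|\, \Phi_2])^2. \]

This implies

\[ \|\Phi_2^T x\|_2 = \Big( \sum_{j \in \Lambda_2} |(\Phi_2^T x)_j|^2 \Big)^{1/2} \le \|x\|_2 \cdot \sqrt{|\Lambda_1| \cdot |\Lambda_2|} \cdot \mu([\Phi_1 \,|\, \Phi_2]). \]

Since $|\Lambda_i| = \|\Phi_i^T x\|_0$, $i = 1, 2$, and again using (13), we obtain

\[ \|\Phi_1^T x\|_0 \cdot \|\Phi_2^T x\|_0 \ge \frac{1}{\mu([\Phi_1 \,|\, \Phi_2])^2}. \]

Using the inequality between the arithmetic and the geometric mean,

\[ \frac{1}{2} \left( \|\Phi_1^T x\|_0 + \|\Phi_2^T x\|_0 \right) \ge \sqrt{\|\Phi_1^T x\|_0 \cdot \|\Phi_2^T x\|_0} \ge \frac{1}{\mu([\Phi_1 \,|\, \Phi_2])}, \]

which proves the claim. □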
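
The bound of Theorem 2.13 is attained in simple cases. As a hedged illustration, the sketch below takes the Fourier and Dirac bases in dimension `n = 16` (for which the mutual coherence equals $1/\sqrt{n}$) and a comb of equally spaced spikes, and counts the non-zero coefficients in both bases. The choice of bases, the signal, the tolerance, and the function names are assumptions made for this example.

```python
import cmath
import math

def fourier_coefficients(x):
    """Coefficients of x in the orthonormal Fourier basis (inner products with the atoms)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * w * t / n) for t in range(n)) / math.sqrt(n)
            for w in range(n)]

def count_nonzero(v, tol=1e-9):
    """Number of entries whose modulus exceeds a small numerical tolerance."""
    return sum(1 for a in v if abs(a) > tol)

n = 16
# Comb with one spike every sqrt(n) = 4 samples.
comb = [1.0 if t % 4 == 0 else 0.0 for t in range(n)]

time_support = count_nonzero(comb)                        # sparsity in the Dirac basis
freq_support = count_nonzero(fourier_coefficients(comb))  # sparsity in the Fourier basis
mu = 1.0 / math.sqrt(n)                                   # mutual coherence of [Fourier | Dirac]
lower_bound = 2.0 / mu
```

The comb has four non-zero samples and four non-zero Fourier coefficients, so the sum of the two sparsity levels equals $2\sqrt{n} = 2/\mu$ and the inequality holds with equality.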

This result can be easily connected to the problem of simultaneously sparse expansions.
The following version was first explicitly stated in [4].

Theorem 2.14 ([4]) Let $\Phi_1$ and $\Phi_2$ be two orthonormal bases for a Hilbert space $\mathcal{H}$, and
let $x \in \mathcal{H}$, $x \neq 0$. Then, for any two distinct coefficient sequences $c_i$ satisfying $x = [\Phi_1 \,|\, \Phi_2] c_i$,
$i = 1, 2$, we have

\[ \|c_1\|_0 + \|c_2\|_0 \ge \frac{2}{\mu([\Phi_1 \,|\, \Phi_2])}. \]

Proof. First, set $d = c_1 - c_2$ and partition $d$ into $[d_{\Phi_1}, d_{\Phi_2}]^T$ such that

\[ 0 = [\Phi_1 \,|\, \Phi_2] d = \Phi_1 d_{\Phi_1} + \Phi_2 d_{\Phi_2}. \]

Since $\Phi_1$ and $\Phi_2$ are bases and $d \neq 0$, the vector $y$ defined by

\[ y = \Phi_1 d_{\Phi_1} = -\Phi_2 d_{\Phi_2} \]

is non-zero. Applying Theorem 2.13, we obtain

\[ \|d\|_0 = \|d_{\Phi_1}\|_0 + \|d_{\Phi_2}\|_0 \ge \frac{2}{\mu([\Phi_1 \,|\, \Phi_2])}. \]

Since $d = c_1 - c_2$, we have

\[ \|c_1\|_0 + \|c_2\|_0 \ge \|d\|_0 \ge \frac{2}{\mu([\Phi_1 \,|\, \Phi_2])}. \]

□

We would also like to mention the very recent paper [39] by Tropp, in which he studies
uncertainty principles for random sparse signals over an incoherent dictionary. He, in
particular, shows that the coefficient sequence of each non-optimal expansion of a signal 
contains far more non-zero entries than the one of the sparsest expansion. 



3 Signal Separation 

In this section, we study the special situation of signal separation, where we refer to 1D
signals as opposed to images, etc. For this, we start with the most prominent example of
separating sinusoids from spikes, and then discuss further problem classes.

3.1 Separation of Sinusoids and Spikes 

Sinusoidal and spike components are intuitively the morphologically most distinct features 
of a signal, since one is periodic and the other transient. Thus, it seems natural that the 
first results using sparsity and $\ell_1$ minimization for data separation were proven for this
situation. Certainly, real-world signals are never a pristine combination of sinusoids and 
spikes. However, thinking of audio data from a recording of musical instruments, these 
components are indeed an essential part of such signals. 

The separation problem can be generally stated in the following way: Let the vector
$x \in \mathbb{R}^n$ consist of $n$ samples of a continuum domain signal at times $t \in \{0, \ldots, n-1\}$. We
assume that $x$ can be decomposed into

$$x = x_1 + x_2.$$

Here $x_1$ shall consist of $n$ samples - at the same points in time as $x$ - of a continuum
domain signal of the form

$$\frac{1}{\sqrt{n}} \sum_{\omega=0}^{n-1} c_{1\omega} e^{2\pi i \omega t / n}.$$

Thus, by letting $\Phi_1 = (\varphi_{1\omega})_{0 \le \omega \le n-1}$ denote the Fourier basis, i.e.,

$$\varphi_{1\omega} = \Big( \frac{1}{\sqrt{n}} e^{2\pi i \omega t / n} \Big)_{0 \le t \le n-1},$$

the discrete signal $x_1$ can be written as

$$x_1 = \Phi_1 c_1 \quad \text{with} \quad c_1 = (c_{1\omega})_{0 \le \omega \le n-1}.$$

If $x_1$ is now the superposition of very few sinusoids, then the coefficient vector $c_1$ is sparse.

Further, consider a continuum domain signal which has a few spikes. Sampling this
signal at $n$ samples at times $t \in \{0, \ldots, n-1\}$ leads to a discrete signal $x_2 \in \mathbb{R}^n$ which
has very few non-zero entries. In order to expand $x_2$ in terms of a suitable representation
system, we let $\Phi_2$ denote the Dirac basis, i.e., $\Phi_2$ is simply the identity matrix, and write

$$x_2 = \Phi_2 c_2,$$

where $c_2$ is then a sparse coefficient vector.

The task now consists in extracting $x_1$ and $x_2$ from the known signal $x$, which is
illustrated in Figure 1. It will be illuminating to detect the dependence on the number of
sampling points of the bound for the sparsity of $c_1$ and $c_2$ which still allows for separation
via $\ell_1$ minimization.
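
To make this setup concrete, the following is a minimal NumPy sketch of the signal model, not code from the cited works. The sample count `n = 64`, the chosen frequencies, the spike positions, and all amplitudes are arbitrary assumptions made for illustration. Since the Fourier basis consists of complex exponentials, the synthesized signal is complex-valued in this sketch.

```python
import numpy as np

n = 64                                    # number of samples (assumed)
t = np.arange(n)

# Fourier basis: column omega holds exp(2*pi*i*omega*t/n) / sqrt(n).
Phi1 = np.exp(2j * np.pi * np.outer(t, t) / n) / np.sqrt(n)
# Dirac basis: the identity matrix.
Phi2 = np.eye(n)

# Sparse coefficient vectors: two sinusoids and three spikes (arbitrary picks).
c1 = np.zeros(n, dtype=complex)
c1[[3, 10]] = [4.0, 2.5]
c2 = np.zeros(n, dtype=complex)
c2[[7, 20, 41]] = [1.5, -2.0, 1.0]

x1 = Phi1 @ c1        # sinusoidal component
x2 = Phi2 @ c2        # spike component
x = x1 + x2           # observed signal

print(np.count_nonzero(c1), np.count_nonzero(c2))
```

Only `x` would be handed to a separation algorithm; `c1` and `c2` are kept here so that a recovered decomposition can be compared against the ground truth.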






Figure 1: Separation of artificial audio data into sinusoids and spikes. 

The intuition that - from a morphological standpoint - this situation is extreme can be
seen by computing the mutual coherence between the Fourier basis $\Phi_1$ and the Dirac basis
$\Phi_2$. For this, we obtain

$$\mu([\Phi_1|\Phi_2]) = \frac{1}{\sqrt{n}}, \qquad (14)$$

and, in fact, $1/\sqrt{n}$ is the minimal possible value. This can be easily seen: If $\Phi_1$ and $\Phi_2$ are
two general orthonormal bases of $\mathbb{R}^n$, then $\Phi_1^T \Phi_2$ is an orthonormal matrix. Hence the sum
of squares of its entries equals $n$, which implies that its $n^2$ entries cannot all be smaller than
$1/\sqrt{n}$ in absolute value.
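
The coherence value and its minimality can be checked numerically. The sketch below is an illustration under assumptions: the helper names `mutual_coherence` and `fourier_basis` are hypothetical, and the random orthonormal basis (obtained from a QR factorization of a Gaussian matrix) is only a comparison case, not something discussed in the text.

```python
import numpy as np

def mutual_coherence(Phi1, Phi2):
    """Mutual coherence of [Phi1 | Phi2] for two orthonormal bases.

    Columns within one orthonormal basis are mutually orthogonal, so the
    maximum is attained by a cross inner product between the two bases.
    """
    return float(np.max(np.abs(Phi1.conj().T @ Phi2)))

def fourier_basis(n):
    """Unitary n x n Fourier matrix with columns indexed by frequency."""
    t = np.arange(n)
    return np.exp(2j * np.pi * np.outer(t, t) / n) / np.sqrt(n)

rng = np.random.default_rng(0)
for n in (16, 64, 256):
    dirac = np.eye(n)
    mu_fourier = mutual_coherence(fourier_basis(n), dirac)
    # A random orthonormal basis for comparison (assumption of this sketch).
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    mu_random = mutual_coherence(Q, dirac)
    print(n, mu_fourier, mu_random, 1 / np.sqrt(n))
```

For the Fourier-Dirac pair the printed coherence should coincide with $1/\sqrt{n}$ up to rounding, while the random basis should never fall below that value.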

The following result from [19] makes this dependence precise. We wish to mention that
the first answer to this question was derived in [11]. In this paper the slightly weaker bound
of $(1 + \sqrt{n})/2$ for $\|c_1\|_0 + \|c_2\|_0$ was proven by using the general result in Theorem 2.1
instead of the more specialized Theorem 2.6 exploited to derive the result from [19] stated
below.

Theorem 3.1 ([19]) Let $\Phi_1$ be the Fourier basis for $\mathbb{R}^n$ and let $\Phi_2$ be the Dirac basis for
$\mathbb{R}^n$. Further, let $x \in \mathbb{R}^n$ be the signal

$$x = x_1 + x_2, \quad \text{where } x_1 = \Phi_1 c_1 \text{ and } x_2 = \Phi_2 c_2,$$

with coefficient vectors $c_i \in \mathbb{R}^n$, $i = 1, 2$. If

$$\|c_1\|_0 + \|c_2\|_0 < (\sqrt{2} - 0.5) \sqrt{n},$$

then the $\ell_1$ minimization problem $(\mathrm{Sep}_s)$ stated in (3) recovers $c_1$ and $c_2$ uniquely, and hence
extracts $x_1$ and $x_2$ from $x$ precisely.

Proof. Recall that we have (cf. (14))

$$\mu([\Phi_1|\Phi_2]) = \frac{1}{\sqrt{n}}.$$

Hence, by Theorem 2.6, the $\ell_1$ minimization problem $(\mathrm{Sep}_s)$ recovers $c_1$ and $c_2$ uniquely,
provided that

$$\|c_1\|_0 + \|c_2\|_0 < \frac{\sqrt{2} - 0.5}{\mu([\Phi_1|\Phi_2])} = (\sqrt{2} - 0.5) \sqrt{n}.$$

The theorem is proved. □ 
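
As a small worked comparison, the two sparsity bounds mentioned above can be evaluated for a few signal lengths. The function names are hypothetical and the values of n are arbitrary choices; the formulas are the bound of Theorem 3.1 and the earlier bound $(1 + \sqrt{n})/2$.

```python
import math

def bound_theorem_3_1(n):
    """Sparsity bound (sqrt(2) - 0.5) * sqrt(n) from Theorem 3.1."""
    return (math.sqrt(2) - 0.5) * math.sqrt(n)

def bound_general(n):
    """Earlier bound (1 + sqrt(n)) / 2 obtained from the general result."""
    return (1 + math.sqrt(n)) / 2

for n in (64, 256, 1024):
    print(n, round(bound_theorem_3_1(n), 2), bound_general(n))
```

For n = 256, the first bound evaluates to roughly 14.6 and the second to 8.5, so the specialized result admits noticeably more non-zero coefficients in total.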

The classical uncertainty principle states that, roughly speaking, a function cannot be
localized both in time and in frequency domain. A discrete version of this fundamental
principle was - besides the by now well-known continuum domain Donoho-Stark uncertainty
principle - derived in [16]. It showed that a discrete signal and its Fourier transform cannot
both be highly localized in the sense of having 'very few' non-zero entries. We will now
show that this result - as it was done in [11] - can be interpreted as a corollary from data
separation results.

Theorem 3.2 ([16]) Let $x \in \mathbb{R}^n$, $x \neq 0$, and denote its Fourier transform by $\hat{x}$. Then

$$\|x\|_0 + \|\hat{x}\|_0 \ge 2\sqrt{n}.$$

Proof. For the proof, we intend to use Theorem 2.13. First, we note that by letting $\Phi_1$
denote the Dirac basis, we trivially have

$$\|\Phi_1^T x\|_0 = \|x\|_0.$$

Secondly, letting $\Phi_2$ denote the Fourier basis, we obtain

$$\hat{x} = \Phi_2^T x.$$

Now recalling that, by (14),

$$\mu([\Phi_1|\Phi_2]) = \frac{1}{\sqrt{n}},$$

we can conclude from Theorem 2.13 that

$$\|x\|_0 + \|\hat{x}\|_0 = \|\Phi_1^T x\|_0 + \|\Phi_2^T x\|_0 \ge \frac{2}{\mu([\Phi_1|\Phi_2])} = 2\sqrt{n}.$$

This finishes the proof. □ 
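
The bound of Theorem 3.2 can be attained. A standard example, added here as an illustration and not taken from the text above, is the Dirac comb with $\sqrt{n}$ equally spaced spikes when n is a perfect square. The sketch uses `np.fft.fft`, whose exponent has the opposite sign to the Fourier basis used in this section; this does not change which entries are non-zero. The tolerance `tol` is an assumption for deciding numerically what counts as zero.

```python
import numpy as np

n = 16
m = int(np.sqrt(n))                 # comb spacing sqrt(n) = 4
x = np.zeros(n)
x[::m] = 1.0                        # spikes at t = 0, 4, 8, 12

x_hat = np.fft.fft(x) / np.sqrt(n)  # unitary scaling of the DFT

tol = 1e-10                         # numerical zero threshold (assumed)
nnz_time = int(np.sum(np.abs(x) > tol))
nnz_freq = int(np.sum(np.abs(x_hat) > tol))
print(nnz_time, nnz_freq, 2 * np.sqrt(n))   # prints: 4 4 8.0
```

Both the signal and its transform have four non-zero entries, so their sum equals $2\sqrt{n} = 8$ exactly.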

As an excellent survey about sparsity of expansions of signals in the Fourier and Dirac
basis, data separation, and related uncertainty principles as well as on very recent results
using random signals, we refer to [38].

3.2 Further Variations

Let us briefly mention the variety of modifications of the previously discussed setting, most
of them empirical analyses, which were developed during the last few years.

The most common variation of the sinusoid and spike setting is the consideration of a
more general periodic component, which is then considered to be sparse in a Gabor system,
superimposed by a second component, which is considered to be sparse in a system sensitive
to spike-like structures similar to wavelets. This is, for instance, the situation considered in
[22]. An example for a different setting is the substitution of a Gabor system by a Wilson
basis, analyzed in [3]. In this paper, as already mentioned in Subsection 2.3, the clustering
of coefficients already plays an essential role. It should also be mentioned that a specifically
adapted norm, namely the mixed $\ell_{1,2}$ or $\ell_{2,1}$ norm, is used in [25] to take advantage of this
clustering, and various numerical experiments show successful separation.
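
To illustrate why a mixed norm rewards clustered coefficients, here is a minimal sketch under explicit assumptions: it uses one common convention (the sum over groups of the Euclidean norm within each group) and groups of consecutive coefficients of fixed size. The exact grouping and normalization used in [25] are not specified in this survey, and the function name `mixed_norm` is hypothetical.

```python
import numpy as np

def mixed_norm(coeffs, group_size):
    """Sum over consecutive groups of the Euclidean norm within each group."""
    groups = np.asarray(coeffs, dtype=float).reshape(-1, group_size)
    return float(np.sum(np.linalg.norm(groups, axis=1)))

clustered = [3.0, 4.0, 0.0, 0.0, 0.0, 0.0]   # both non-zeros in one group
scattered = [3.0, 0.0, 4.0, 0.0, 0.0, 0.0]   # non-zeros in different groups

print(mixed_norm(clustered, 2), mixed_norm(scattered, 2))   # prints: 5.0 7.0
print(np.sum(np.abs(clustered)), np.sum(np.abs(scattered)))  # prints: 7.0 7.0
```

Both vectors have the same $\ell_1$ norm, but the mixed norm is smaller for the clustered one, so a minimization based on it prefers coefficients that concentrate in few groups.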

4 Image Separation 

This section is devoted to results on image separation exploiting Morphological
Component Analysis, focusing first on empirical studies and then on theoretical results.

4.1 Empirical Results 

In practice, the observed signal $x$ is often contaminated by noise, i.e., $x = x_1 + x_2 + n$
containing the to-be-extracted components $x_1$ and $x_2$ and some noise $n$. This requires
an adaptation of the $\ell_1$ minimization problem. As proposed in numerous publications, one
typically considers a modified optimization problem - so-called Basis Pursuit Denoising
- which can be obtained by relaxing the constraint in order to deal with noisy observed
signals. The $\ell_1$ minimization problem $(\mathrm{Sep}_s)$ stated in (3), which places the $\ell_1$ norm on the
synthesis side, then takes the form:

$$\min_{c_1, c_2} \|c_1\|_1 + \|c_2\|_1 + \lambda \|x - \Phi_1 c_1 - \Phi_2 c_2\|_2^2$$

with appropriately chosen regularization parameter $\lambda > 0$. Similarly, we can consider the
relaxed form of the $\ell_1$ minimization problem $(\mathrm{Sep}_a)$ stated in (6), which places the $\ell_1$ norm
on the analysis side:

$$\min_{x_1, x_2} \|\Phi_1^T x_1\|_1 + \|\Phi_2^T x_2\|_1 + \lambda \|x - x_1 - x_2\|_2^2.$$

In these new forms, the additional content in the image - the noise, characterized by the
property that it cannot be represented sparsely by either one of the two systems $\Phi_1$ and
$\Phi_2$ - will be allocated to the residual $x - \Phi_1 c_1 - \Phi_2 c_2$ or $x - x_1 - x_2$, depending on which of the
two minimization problems stated above is chosen. Hence, performing this minimization,
we not only separate the data, but also succeed in removing an additive noise component
as a by-product.
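
The synthesis-side problem can be sketched with a plain iterative shrinkage scheme (ISTA). This is a minimal illustration under assumptions, not the algorithm of any cited toolbox: the data fidelity term is taken as the squared $\ell_2$ norm, the dictionaries are the Fourier and Dirac bases from Section 3.1, and the signal, noise level, regularization parameter `lam`, and iteration count are arbitrary choices. The names `soft_threshold`, `objective`, and `separate_ista` are hypothetical.

```python
import numpy as np

def soft_threshold(z, tau):
    """Shrink the modulus of every (possibly complex) entry of z by tau."""
    mag = np.abs(z)
    return z * np.maximum(mag - tau, 0.0) / np.maximum(mag, 1e-300)

def objective(x, Phi, c, lam):
    """Value of ||c||_1 + lam * ||x - Phi c||_2^2."""
    return np.sum(np.abs(c)) + lam * np.linalg.norm(x - Phi @ c) ** 2

def separate_ista(x, Phi1, Phi2, lam, n_iter=1000):
    """ISTA for min ||c1||_1 + ||c2||_1 + lam * ||x - Phi1 c1 - Phi2 c2||_2^2."""
    Phi = np.hstack([Phi1, Phi2])
    L = np.linalg.norm(Phi, 2) ** 2          # squared spectral norm of [Phi1|Phi2]
    c = np.zeros(Phi.shape[1], dtype=complex)
    for _ in range(n_iter):
        residual = x - Phi @ c
        # Gradient step on the quadratic term, then shrinkage for the l1 term.
        c = soft_threshold(c + Phi.conj().T @ residual / L, 1.0 / (2.0 * lam * L))
    n1 = Phi1.shape[1]
    return c[:n1], c[n1:]

# Toy data: two sinusoids plus three spikes plus noise (all values assumed).
n = 64
t = np.arange(n)
Phi1 = np.exp(2j * np.pi * np.outer(t, t) / n) / np.sqrt(n)   # Fourier basis
Phi2 = np.eye(n, dtype=complex)                                # Dirac basis
c1 = np.zeros(n, dtype=complex)
c1[[3, 10]] = [4.0, 2.5]
c2 = np.zeros(n, dtype=complex)
c2[[7, 20, 41]] = [1.5, -2.0, 1.0]
x1, x2 = Phi1 @ c1, Phi2 @ c2
rng = np.random.default_rng(0)
x = x1 + x2 + 0.005 * rng.standard_normal(n)

c1_hat, c2_hat = separate_ista(x, Phi1, Phi2, lam=25.0)
err1 = np.linalg.norm(Phi1 @ c1_hat - x1) / np.linalg.norm(x1)
err2 = np.linalg.norm(Phi2 @ c2_hat - x2) / np.linalg.norm(x2)
print(err1, err2)   # relative errors of the two recovered components
```

The step size is chosen from the spectral norm of the combined dictionary so that each iteration does not increase the objective. The noise ends up mostly in the residual, which is the by-product denoising effect described above.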

There exist by now a variety of algorithms which numerically solve such minimization
problems. One large class are, for instance, iterative shrinkage algorithms; and we refer to
the beautiful new book [18] by Elad for an overview. It should be mentioned that it is also
possible to perform these separation procedures locally, thus enabling parallel processing,
and again we refer to [18] for further details.

Let us now delve into more concrete situations. One prominent class of empirical studies 
concerns the separation of point- and curvelike structures. This type of problem arises, for 
instance, in astronomical imaging, where astronomers would like to separate stars (pointlike 
structures) from filaments (curvelike structures). Another area in which the separation of 
points from curves is essential is neurobiological imaging. In particular, for Alzheimer 
research, neurobiologists analyze images of neurons, which - considered in 2D - are a 
composition of the dendrites (curvelike structures) of the neuron and the attached spines 
(pointlike structures). For further analysis of the shape of these components, dendrites and 
spines need to be separated. 

From a mathematical perspective, pointlike structures are generally speaking 0D structures
whereas curvelike structures are 1D structures, which reveals their morphological
difference. Thus it seems conceivable that separation using the idea of Morphological
Component Analysis can be achieved, and the empirical results presented in the sequel as well
as the theoretical results discussed in Subsection 4.2 give evidence to this claim.

To set up the minimization problem properly, the question arises which systems adapted 
to the point- and curvelike objects to use. For extracting pointlike structures, wavelets seem 
to be optimal, since they provide optimally sparse approximations of smooth functions 
with finitely many point singularities. As a sparsifying system for curvelike structures, 
two different possibilities were explored so far. From a historical perspective, the first 
system to be utilized were curvelets [5], which provide optimally sparse approximations of 
smooth functions exhibiting curvilinear singularities. The composed dictionary of
wavelets-curvelets is used in MCALab^1, and implementation details are provided in the by now
considered fundamental paper [35]. A few years later shearlets were developed, see [J^ or 
the survey paper [27], which deal with curvilinear singularities in a similarly favorable way 
as curvelets (cf. [21]), but have, for instance, the advantage of providing a unified treatment 
of the continuum and digital realm and being associated with a fast transform. Separation 
using the resulting dictionary of wavelets-shearlets is implemented and publicly available in
ShearLab^2. For a close comparison between both approaches we refer to [29] - in this paper
the separation algorithm using wavelets and shearlets is also detailed -, where a numerical
comparison shows that ShearLab provides a faster as well as more precise separation.

For illustrative purposes, Figure 2 shows the separation of an artificial image composed
of points, lines, and a circle as well as added noise into the pointlike structures (points) and
the curvelike structures (lines and the circle), while removing the noise simultaneously. The
only visible artifacts can be seen at the intersections of the curvelike structures, which is not
surprising since it is even justifiable to label these intersections as 'points'. As an example
using real data, we present in Figure 3 the separation of a neuron image into dendrites and
spines again using ShearLab.

^1 MCALab (Version 120) is available from http://jstarck.free.fr/jstarck/Home.html.
^2 ShearLab (Version 1.1) is available from http://www.shearlab.org.

Figure 2: Separation of an artificial image composed of points, lines, and a circle into point-
and curvelike components using ShearLab. Panels: (a) Original image, (b) Noisy image,
(c) Pointlike Component, (d) Curvelike Component.

Another widely explored category of image separation is the separation of cartoons
and texture. Here, the term cartoon typically refers to a piecewise smooth part in the
image, and texture means a periodic structure. A mathematical model for a cartoon was
first introduced in [8] as a function containing a discontinuity. In contrast to this,
the term texture is a widely open expression, and people have debated for years over an
appropriate model for the texture content of an image. A viewpoint from applied harmonic
analysis characterizes texture as a structure which exhibits a sparse expansion in a Gabor
system. As a side remark, the reader should be aware that periodizing a cartoon part of an
image produces a texture component, thereby revealing the very fine line between cartoons
and texture, illustrated in Figure 4.

As sparsifying systems, again curvelets or shearlets are suitable for the cartoon part,
whereas discrete cosines or a Gabor system can be used for the texture part. MCALab
uses for this separation task a dictionary composed of curvelets and discrete cosines, see
[35]. For illustrative purposes, we display in Figure 5 the separation of the Barbara image
into cartoon and texture component performed by MCALab. As can be seen, all periodic
structure is captured in the texture part, leaving the remainder to the cartoon component.

4.2 Theoretical Results 

The first theoretical result explaining the successful empirical performance of Morphological
Component Analysis was derived in [14] by considering the separation of point- and curvelike
features in images, coined the Geometric Separation Problem. The analysis in this paper
has three interesting features. Firstly, it introduces the notion of cluster coherence (cf.
Definition 2.8) as a measure for the geometric arrangements of the significant coefficients
and hence the encoding of the morphological difference of the components. It also initiates
the study of $\ell_1$ minimization in frame settings, in particular those where singleton coherence
within one frame may be high. Secondly, it provides the first analysis of a continuum model
in contrast to the previously studied discrete models which obscure continuum elements
of geometry. And thirdly, it explores microlocal analysis to understand heuristically why
separation might be possible and to organize a rigorous analysis. This general approach
applies in particular to two variants of geometric separation algorithms. One is based on
tight frames of radial wavelets and curvelets and the other uses orthonormal wavelets and
shearlets.

Figure 3: Separation of a neuron image into point- and curvelike components using ShearLab.
Panels: (a) Original image, (b) Pointlike Component, (c) Curvelike Component.

Figure 4: Periodic small cartoons versus one large cartoon.

Figure 5: Separation of the Barbara image into cartoon and texture using MCALab.
Panels: (a) Barbara image, (b) Cartoon Component, (c) Texture Component.

These results are today the only results providing a theoretical foundation to image
separation using ideas from sparsity methodologies. The same situation - separating point-
and curvelike objects - is also considered in [13], however using thresholding as a separation
technique. Finally, we wish to mention that some initial theoretical results on the separation
of cartoon and texture in images are contained in [15].

Let us now dive into the analysis of [14]. As a mathematical model for a composition of
point- and curvelike structures, the following two components are considered: The function
$\mathcal{P}$ on $\mathbb{R}^2$, which is smooth except for point singularities and defined by

$$\mathcal{P}(x) = \sum_{i=1}^{P} |x - x_i|^{-3/2},$$

serves as a model for the pointlike objects, and the distribution $\mathcal{C}$ with singularity along a
closed curve $\tau : [0, 1] \to \mathbb{R}^2$ defined by

$$\mathcal{C} = \int \delta_{\tau(t)} \, dt$$

models the curvelike objects. The general model for the considered situation is then the
sum of both, i.e.,

$$f = \mathcal{P} + \mathcal{C}, \qquad (15)$$

and the Geometric Separation Problem consists of recovering $\mathcal{P}$ and $\mathcal{C}$ from the observed
signal $f$.

As discussed before, one possibility is to set up the minimization problem using an
overcomplete system composed of wavelets and curvelets. For the analysis, radial wavelets
are used due to the fact that they provide the same subbands as curvelets. To be more
precise, let $W$ be an appropriate window function. Then radial wavelets at scale $j$ and
spatial position $k = (k_1, k_2)$ are defined by the Fourier transforms

$$\hat{\psi}_\lambda(\xi) = 2^{-j} \cdot W(|\xi|/2^j) \cdot e^{i \langle k, \xi/2^j \rangle},$$

where $\lambda = (j, k)$ indexes scale and position. For the same window function $W$ and a 'bump
function' $V$, curvelets at scale $j$, orientation $\ell$, and spatial position $k = (k_1, k_2)$ are defined
by the Fourier transforms

$$\hat{\gamma}_\eta(\xi) = 2^{-3j/4} \cdot W(|\xi|/2^j) V((\omega - \theta_{j,\ell}) 2^{j/2}) \cdot e^{i (R_{\theta_{j,\ell}} A_{2^{-j}} k)^T \xi},$$

where $\omega$ is the polar angle of $\xi$, $\theta_{j,\ell} = 2\pi\ell/2^{j/2}$, $R_\theta$ is planar rotation by $-\theta$ radians, $A_a$ is
anisotropic scaling with diagonal $(a, \sqrt{a})$, and we let $\eta = (j, \ell, k)$ index scale, orientation,
and position; see [5] for more details. The tiling of the frequency domain generated by these
two systems is illustrated in Figure 6.




(a) Radial wavelets (b) Curvelets 



Figure 6: Tiling of the frequency domain by radial wavelets and curvelets. 

By using again the window $W$, we define the family of filters $F_j$ by their transfer
functions

$$\hat{F}_j(\xi) = W(|\xi|/2^j), \quad \xi \in \mathbb{R}^2.$$

These filters provide a decomposition of any distribution $g$ into pieces $g_j$ with different
scales, the piece $g_j$ at subband $j$ generated by filtering $g$ using $F_j$:

$$g_j = F_j * g.$$

A proper choice of $W$ then enables reconstruction of $g$ from these pieces using the formula

$$g = \sum_j F_j * g_j.$$

Application of this filtering procedure to the model image $f$ from (15) yields the
decompositions

$$f_j = F_j * f = F_j * (\mathcal{P} + \mathcal{C}) = \mathcal{P}_j + \mathcal{C}_j,$$

where $(f_j)_j$ is known, and we aim to extract $(\mathcal{P}_j)_j$ and $(\mathcal{C}_j)_j$. We should mention at this
point that, in fact, the pair $(\mathcal{P}, \mathcal{C})$ was chosen in such a way that $\mathcal{P}_j$ and $\mathcal{C}_j$ have the same
energy for each $j$, thereby making the components comparable as we go to finer scales and
the separation challenging at each scale.
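
The subband filtering and the reconstruction from the filtered pieces can be sketched on a discrete grid as follows. This is an illustration under assumptions: the windows are triangular bumps in the logarithm of the radial frequency, normalized so that their squares sum to one, which is one possible 'proper choice' and not the window used in [14]. The test image and the names `radial_windows`, `filter_subbands`, and `reconstruct` are hypothetical.

```python
import numpy as np

def radial_windows(shape):
    """Radial windows W_j on the FFT grid with sum_j W_j**2 == 1.

    The triangular bumps in log-radius are an assumption of this sketch.
    """
    fy = np.fft.fftfreq(shape[0]) * shape[0]          # integer frequencies
    fx = np.fft.fftfreq(shape[1]) * shape[1]
    FY, FX = np.meshgrid(fy, fx, indexing="ij")
    t = np.log2(np.maximum(np.hypot(FY, FX), 1.0))    # log-radius, clamped at 0
    n_scales = int(np.ceil(t.max())) + 1
    bumps = np.stack([np.clip(1.0 - np.abs(t - j), 0.0, None)
                      for j in range(n_scales)])
    return bumps / np.sqrt(np.sum(bumps ** 2, axis=0))

def filter_subbands(g, windows):
    """Pieces g_j = F_j * g, computed as pointwise products in frequency."""
    g_hat = np.fft.fft2(g)
    return [np.fft.ifft2(W * g_hat) for W in windows]

def reconstruct(pieces, windows):
    """Reassemble g as the sum over j of F_j * g_j."""
    return sum(np.fft.ifft2(W * np.fft.fft2(gj)) for W, gj in zip(windows, pieces))

g = np.zeros((32, 32))
g[8, 8] = 1.0          # a pointlike feature
g[:, 20] = 1.0         # a straight line as a curvelike feature
windows = radial_windows(g.shape)
pieces = filter_subbands(g, windows)
print(len(pieces), np.allclose(reconstruct(pieces, windows), g))
```

Because each window is applied twice (once for filtering, once for reconstruction), perfect reconstruction requires the squared windows to sum to one at every frequency, which the normalization enforces.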

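Let now $\Phi_1$ and $\Phi_2$ be the tight frames of radial wavelets and curvelets, respectively.
Then, for each scale $j$, we consider the $\ell_1$ minimization problem $(\mathrm{Sep}_a)$ stated in (6), which
now reads:

$$\min_{\mathcal{P}_j, \mathcal{C}_j} \|\Phi_1^T \mathcal{P}_j\|_1 + \|\Phi_2^T \mathcal{C}_j\|_1 \quad \text{s.t.} \quad f_j = \mathcal{P}_j + \mathcal{C}_j. \qquad (16)$$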

Notice that we use the 'analysis version' of the minimization problem, since both radial 
wavelets as well as curvelets are overcomplete systems. 

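The theoretical result on the precision of separation of $f_j$ via (16) proved in [14] can now
be stated in the following way:

Theorem 4.1 ([14]) Let $\hat{\mathcal{P}}_j$ and $\hat{\mathcal{C}}_j$ be the solutions to the optimization problem (16) for
each scale $j$. Then we have

$$\frac{\|\hat{\mathcal{P}}_j - \mathcal{P}_j\|_2 + \|\hat{\mathcal{C}}_j - \mathcal{C}_j\|_2}{\|\mathcal{P}_j\|_2 + \|\mathcal{C}_j\|_2} \to 0, \quad j \to \infty.$$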

This result shows that the components $\mathcal{P}_j$ and $\mathcal{C}_j$ are recovered with asymptotically
arbitrarily high precision at very fine scales. The energy in the pointlike component is
completely captured by the wavelet coefficients, and the curvelike component is completely
contained in the curvelet coefficients. Thus, the theory evidences that the Geometric
Separation Problem can be satisfactorily solved by using a combined dictionary of wavelets
and curvelets and an appropriate $\ell_1$ minimization problem, as already the empirical results
indicate.

We next provide a sketch of proof and refer to [14] for the complete proof.

Proof [Sketch of proof of Theorem 4.1]. The main goal will be to apply Theorem 2.12
to each scale and prove that the sequence of bounds converges to zero. For this, let
$j$ be arbitrarily fixed, and apply Theorem 2.12 in the following way:

• $x$: Filtered signal $f_j$ ($= \mathcal{P}_j + \mathcal{C}_j$).

• $\Phi_1$: Wavelets filtered with $F_j$.

• $\Phi_2$: Curvelets filtered with $F_j$.

• $\Lambda_1$: Significant wavelet coefficients of $\mathcal{P}_j$.

• $\Lambda_2$: Significant curvelet coefficients of $\mathcal{C}_j$.

• $\delta_j$: Degree of approximation by significant coefficients.

• $(\mu_c)_j$: Cluster coherence of wavelets-curvelets.

If

$$\frac{2\delta_j}{1 - 2(\mu_c)_j} = o(\|\mathcal{P}_j\|_2 + \|\mathcal{C}_j\|_2) \quad \text{as } j \to \infty \qquad (17)$$

can be then shown, the theorem is proved.

One main problem to overcome is the highly delicate choice of $\Lambda_1$ and $\Lambda_2$. It would be
ideal to define those sets in such a way that

$$\delta_j = o(\|\mathcal{P}_j\|_2 + \|\mathcal{C}_j\|_2) \quad \text{as } j \to \infty \qquad (18)$$

and

$$(\mu_c)_j \to 0 \quad \text{as } j \to \infty \qquad (19)$$

are true. This would then imply (17), hence finish the proof.

A microlocal analysis viewpoint now provides insight into how to suitably choose $\Lambda_1$
and $\Lambda_2$ by considering the wavefront sets of $\mathcal{P}$ and $\mathcal{C}$ in phase space $\mathbb{R}^2 \times [0, 2\pi)$, i.e.,

$$WF(\mathcal{P}) = \{x_i\}_{i=1}^{P} \times [0, 2\pi)$$

and

$$WF(\mathcal{C}) = \{(\tau(t), \theta(t)) : t \in [0, L(\tau)]\},$$

where $\tau(t)$ is a unit-speed parametrization of $\tau$ and $\theta(t)$ is the normal direction to $\tau$ at
$\tau(t)$. Heuristically, the significant wavelet coefficients should be associated with wavelets
whose index set is 'close' to $WF(\mathcal{P})$ in phase space and, similarly, the significant curvelet
coefficients should be associated with curvelets whose index set is 'close' to $WF(\mathcal{C})$. Thus,
using Hart Smith's phase space metric,

$$d_{HS}((b, \theta); (b', \theta')) = |\langle e_\theta, b - b' \rangle| + |\langle e_{\theta'}, b - b' \rangle| + |b - b'|^2 + |\theta - \theta'|^2,$$

where $e_\theta = (\cos(\theta), \sin(\theta))$, an 'approximate' form of sets of significant wavelet coefficients
is

$$\Lambda_{1,j} = \{\text{wavelet lattice}\} \cap \{(b, \theta) : d_{HS}((b, \theta); WF(\mathcal{P})) \le \eta_j 2^{-j}\},$$

and an 'approximate' form of sets of significant curvelet coefficients is

$$\Lambda_{2,j} = \{\text{curvelet lattice}\} \cap \{(b, \theta) : d_{HS}((b, \theta); WF(\mathcal{C})) \le \eta_j 2^{-j}\}$$

with a suitable choice of the distance parameters $(\eta_j)_j$. In the proof of Theorem 4.1, the
definition of $(\Lambda_{1,j})_j$ and $(\Lambda_{2,j})_j$ is much more delicate, but follows this intuition. Lengthy
and technical estimates then lead to (18) and (19), which - as mentioned before - completes
the proof. □
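
The phase space metric used in this sketch of proof is simple to evaluate. The following minimal implementation is only an illustration: the function name `d_hs` is hypothetical, and angles are compared directly, so wrap-around at $2\pi$ is ignored. The two sample evaluations show that the same spatial displacement is penalized more when it points along the direction $e_\theta$ than when it is perpendicular to it.

```python
import numpy as np

def d_hs(b, theta, b2, theta2):
    """Hart Smith phase space distance between (b, theta) and (b2, theta2).

    Angles are compared directly; wrap-around at 2*pi is ignored in this sketch.
    """
    diff = np.asarray(b, dtype=float) - np.asarray(b2, dtype=float)
    e1 = np.array([np.cos(theta), np.sin(theta)])
    e2 = np.array([np.cos(theta2), np.sin(theta2)])
    return (abs(e1 @ diff) + abs(e2 @ diff)
            + diff @ diff + (theta - theta2) ** 2)

# Same unit displacement, measured along and across the direction e_theta.
along = d_hs((1.0, 0.0), 0.0, (0.0, 0.0), 0.0)               # 1 + 1 + 1 + 0
across = d_hs((1.0, 0.0), np.pi / 2, (0.0, 0.0), np.pi / 2)  # about 0 + 0 + 1 + 0
print(along, across)
```

This anisotropy is what allows neighborhoods of the wavefront set of a curve to be thin in the normal direction while extending along the curve.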

Since it was already mentioned in Subsection 4.1 that a combined dictionary of wavelets
and shearlets might be preferable, the reader will wonder whether the just discussed
theoretical results can be transferred to this setting. In fact, this is proven in [26], see also [12].
It should be mentioned that one further advantage of this setting is the fact that now a
basis of wavelets can be utilized in contrast to the tight frame of radial wavelets explored
before.

As a wavelet basis, we now choose orthonormal Meyer wavelets, and refer to [30] for the
definition. For the definition of shearlets, for $j \ge 0$ and $k \in \mathbb{Z}$, let $A_{2^j}$ - the notion was
already introduced in the definition of curvelets -, $\tilde{A}_{2^j}$, and $S_k$ be defined by

$$A_{2^j} = \begin{pmatrix} 2^j & 0 \\ 0 & 2^{j/2} \end{pmatrix}, \quad \tilde{A}_{2^j} = \begin{pmatrix} 2^{j/2} & 0 \\ 0 & 2^j \end{pmatrix}, \quad S_k = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}.$$

For $\phi, \psi, \tilde{\psi} \in L^2(\mathbb{R}^2)$, the cone-adapted discrete shearlet system is then the union of

$$\{\phi(\cdot - m) : m \in \mathbb{Z}^2\},$$

$$\{2^{3j/4} \psi(S_k A_{2^j} \cdot - m) : j \ge 0, \ -\lceil 2^{j/2} \rceil \le k \le \lceil 2^{j/2} \rceil, \ m \in \mathbb{Z}^2\},$$

and

$$\{2^{3j/4} \tilde{\psi}(S_k^T \tilde{A}_{2^j} \cdot - m) : j \ge 0, \ -\lceil 2^{j/2} \rceil \le k \le \lceil 2^{j/2} \rceil, \ m \in \mathbb{Z}^2\}.$$

The term 'cone-adapted' originates from the fact that these systems tile the frequency
domain in a cone-like fashion; see Figure 7(b).
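
The index structure of the cone-adapted system can be made tangible with a few lines of NumPy. The sketch assumes the standard parabolic scaling matrices $A_{2^j} = \mathrm{diag}(2^j, 2^{j/2})$ and $\tilde{A}_{2^j} = \mathrm{diag}(2^{j/2}, 2^j)$ and the shear matrix $S_k$ with rows $(1, k)$ and $(0, 1)$; the function names are hypothetical. It lists how many shear parameters $k$ occur at each scale and checks that the normalization factor $2^{3j/4}$ is the square root of the determinant of $S_k A_{2^j}$.

```python
import math
import numpy as np

def A(j):
    """Parabolic scaling matrix diag(2^j, 2^(j/2))."""
    return np.diag([2.0 ** j, 2.0 ** (j / 2)])

def A_tilde(j):
    """Parabolic scaling matrix for the second cone, diag(2^(j/2), 2^j)."""
    return np.diag([2.0 ** (j / 2), 2.0 ** j])

def S(k):
    """Shear matrix [[1, k], [0, 1]]."""
    return np.array([[1.0, float(k)], [0.0, 1.0]])

def shear_range(j):
    """Shear parameters k with -ceil(2^(j/2)) <= k <= ceil(2^(j/2))."""
    kmax = math.ceil(2 ** (j / 2))
    return range(-kmax, kmax + 1)

for j in range(5):
    ks = shear_range(j)
    det = np.linalg.det(S(ks[-1]) @ A(j))    # equals 2^(3j/2) for every k
    print(j, len(ks), det, 2.0 ** (0.75 * j) ** 2)
```

The number of shears per scale grows like $2^{j/2}$, which mirrors the increasing number of orientations of curvelets at finer scales.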



Figure 7: Tiling of the frequency domain by orthonormal Meyer wavelets and shearlets.
Panels: (a) Wavelets, (b) Shearlets.

As can be seen from Figure 7, the subbands associated with orthonormal Meyer wavelets
and shearlets are the same. Hence a similar filtering into scaling subbands can be performed
as for radial wavelets and curvelets.

Adapting the optimization problem (16) by using wavelets and shearlets instead of
radial wavelets and curvelets generates purported point- and curvelike objects $W_j$ and $S_j$,
say, for each scale $j$. Then the following result, which shows similarly successful separation
as Theorem 4.1, was derived in [26] with the new concept of sparsity equivalence, here
between shearlets and curvelets, introduced in the same paper as main ingredient.



Theorem 4.2 ([26]) We have

$$\frac{\|\mathcal{P}_j - W_j\|_2 + \|\mathcal{C}_j - S_j\|_2}{\|\mathcal{P}_j\|_2 + \|\mathcal{C}_j\|_2} \to 0, \quad j \to \infty.$$